Compensation of small data with large filters for accurate liver vessel segmentation from contrast-enhanced CT images

Background Segmenting liver vessels from contrast-enhanced computed tomography images is essential for diagnosing liver diseases, planning surgeries and delivering radiotherapy. Nevertheless, identifying vessels is challenging because vessels occupy tiny cross-sectional areas, which leaves few features to be learned and makes it difficult to construct high-quality, large-volume data. Methods We present an approach that only requires a few labeled vessels but delivers significantly improved results. Our model starts with vessel enhancement by fading out liver intensity and generates candidate vessels by a classifier fed with a large number of image filters. Afterwards, the initial segmentation is refined using Markov random fields. Results In experiments on the well-known dataset 3D-IRCADb, the averaged Dice coefficient is lifted to 0.63, and the mean sensitivity is increased to 0.71. These results are significantly better than those obtained from existing machine-learning approaches and comparable to those generated from deep-learning models. Conclusion Sophisticated integration of a large number of filters is able to pinpoint effective features from liver images that are sufficient to distinguish vessels from other liver tissues under a scarcity of large-volume labeled data. The study can shed light on medical image segmentation, especially for tasks without sufficient data.

Hence, deep learning-based approaches have been intensively explored and exploited to overcome these constraints because of their automatic feature learning characteristics. These approaches include convolutional neural network-based [14][15][16], recurrent neural network-based [17], a mixture of convolutional and recurrent neural networks [18], and integration of deep neural networks with conventional machine learning techniques [19,20]. These deep learning-based models manifest remarkable improvement compared with the traditional approaches. However, they require large volumes of manually delineated images containing vessels. Unfortunately, delineating vessel masks with high fidelity is prohibitively difficult and time-consuming. The main obstacles preventing this goal are small size, irregular shape, low contrast and heavy noise; cf. Fig. 1. Hence, developing a model-driven but not data-starved approach is still very promising.
To this end, we develop a new computational model that borrows a large number of existing renowned image filters to distinguish vessels from other tissues and then uses XGBoost [21] to classify each pixel as vessel or other. Finally, a refined Markov random field integrates neighborhood information to polish the results. Experimental results on the widely used dataset 3D-IRCADb [22] show that our newly proposed model outperforms all existing traditional machine learning models and is even better than deep learning-based models in most cases. Our model only requires a small number of labeled images for training but yields competitive or better results. The success reveals that many filters can compensate for the shortage of labeled data, which can be inspiring and promising for tasks where high-quality data is challenging to obtain.

Methods
The proposed liver vessel segmentation model consists of three modules: vessel enhancement, candidate generation, and segmentation refinement; see Fig. 2. The details are as follows.

Vessel enhancement
Two procedures, calibration and contrast enhancement, are applied to the raw images to enhance the edges between vessel areas and other liver tissues.
Calibration is necessary because the raw image may need to be clipped to an appropriate window for vessel analysis. To this end, we automatically determine the window center and width by a statistical approach. Precisely, the mean µ and standard deviation σ of vessel intensities are determined. Then, the intensities of all images are clipped into the interval [µ − 3σ, µ + 3σ]. These clipped intensities are further normalized to alleviate the systematic bias between different imaging devices; in this normalization, f(x, y) is the initial intensity of an image at position (x, y), and α and c are used to transform the normalized values into gray scales from 0 to 255.
Fig. 1 Vessel segmentation. The first row is the original images, while the second is the vessel masks obtained from 3D-IRCADb [22]
After calibration, the vessels are enhanced by a filtering step in which k(x, y) is the kernel of a low-pass filter, a scalar coefficient controls its magnitude, and ⊛ denotes convolution. This operation helps wash out many liver tissues and makes vessels stand out.
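The calibration and enhancement steps described in this subsection can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the linear map `alpha * (f - mu) / sigma + c` with `alpha = 255 / 6` and `c = 127.5`, the box-shaped low-pass kernel, the subtraction form `g - lam * (k ⊛ g)`, and the names `calibrate`, `box_lowpass`, `enhance` and `lam` are all hypothetical choices that merely agree with the description.

```python
import numpy as np

def calibrate(image, vessel_mask):
    """Clip to [mu - 3*sigma, mu + 3*sigma] and rescale to gray levels 0..255."""
    vessel_values = image[vessel_mask > 0].astype(float)
    mu, sigma = vessel_values.mean(), vessel_values.std()
    sigma = max(sigma, 1e-6)  # guard against a degenerate (constant) vessel region
    clipped = np.clip(image.astype(float), mu - 3 * sigma, mu + 3 * sigma)
    # Assumed linear map: z-scores in [-3, 3] are sent to [0, 255].
    alpha, c = 255.0 / 6.0, 127.5
    return alpha * (clipped - mu) / sigma + c

def box_lowpass(image, size=15):
    """Mean filter with an odd `size`; stands in for the low-pass kernel k(x, y)."""
    pad = size // 2
    rows, cols = image.shape
    padded = np.pad(image.astype(float), pad, mode="edge")
    out = np.zeros((rows, cols))
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / (size * size)

def enhance(image, lam=0.8, size=15):
    """Assumed enhancement: subtract a scaled low-pass copy to fade the liver background."""
    return image - lam * box_lowpass(image, size)
```

On a constant image the low-pass copy equals the image itself, so `enhance` simply scales it by `1 - lam`; structures narrower than the kernel, such as vessels, lose less of their intensity than the smooth background and therefore stand out.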

Context-aware vessel identification
Based on the filters, each pixel is represented by a d-dimensional vector containing its original intensity as well as all the values generated by the filters. Hence, the context as well as the vessel regions can be represented by a vector of length n × d, with n the number of neighbors surrounding the pixel of interest to be classified.
The neighbors of a pixel (i, j, k) are the positions (i′, j′, k′) within h steps along each axis,
$$N_h(i, j, k) = \{(i', j', k') : |i' - i| \le h,\ |j' - j| \le h,\ |k' - k| \le h\},$$
where i, j and k are the indices of a pixel, i and j are used to locate the pixel in a slice, and k is used to locate the slice in a volume. The h is set to 1, 2 and 3, resulting in voxel sizes of 3 × 3 × 3, 5 × 5 × 5 and 7 × 7 × 7, respectively. For the 2D situation, only i and j are considered.
The pixel of interest and its neighbors form a voxel whose features are obtained from its constituent pixels and whose label is the mask value of the central pixel. The features are obtained by using the above filters. The final features of the voxel are input into XGBoost [21] for feature selection and pixel classification.
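The construction of a voxel's feature vector can be illustrated with a short NumPy sketch. It assumes that the filter responses have already been stacked into an array of shape (slices, rows, columns, d) and that border pixels are handled by edge replication; the function name `voxel_features`, the array layout and the padding choice are hypothetical and are not taken from the original work.

```python
import numpy as np

def voxel_features(feature_volume, i, j, k, h):
    """Flatten the (2h+1)^3 neighborhood of pixel (i, j) in slice k.

    feature_volume: array of shape (slices, rows, cols, d) holding, for every
    pixel, its original intensity plus the filter responses (d values).
    Returns a 1-D vector of length (2h+1)**3 * d.
    """
    # Edge replication at the borders is an assumption of this sketch.
    padded = np.pad(feature_volume, ((h, h), (h, h), (h, h), (0, 0)), mode="edge")
    # After padding by h, original index k sits at k + h, so the window
    # k-h .. k+h becomes k .. k+2h in padded coordinates (same for i and j).
    block = padded[k:k + 2 * h + 1, i:i + 2 * h + 1, j:j + 2 * h + 1, :]
    return block.reshape(-1)
```

Each such vector, paired with the mask value of the central pixel, would form one training sample for the classifier. A 2D variant would take the window only within slice k.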

Segmentation refinement
The vessel segmentation is further refined by a Markov random field (MRF) [35], because the classification is conducted only at the pixel level and thus ignores the correlation between pixels.
An MRF is defined on a graph G = (V, E), where V is the set of nodes (e.g., the pixels of an image) and E is the set of edges connecting the nodes in V (e.g., adjacent pixels). For a random variable v_i in G, its probability is independent of all other variables given its neighbors N(v_i), which are named the Markov blanket; that is,
$$P(v_i \mid V \setminus \{v_i\}) = P(v_i \mid N(v_i)).$$
Based on the Hammersley-Clifford theorem [36], it can be expressed as
$$P(v_i) = \frac{1}{Z} \exp\bigl(-E(v_i)\bigr),$$
where E(·) is an energy function and Z is the partition function computed by Z = Σ_{v_i} exp(−E(v_i)). In this study, E(v_i) is calculated from u_i, the refined value of the variable v_i, and ρ(x, σ), the Lorentzian function [37]. By minimizing the energy function E, we obtain the refined segmentation of the vessels based on the pixel-wise classification results.
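To make the refinement step concrete, the following sketch minimizes an MRF energy on a single 2D slice. Everything specific here is an assumption rather than the paper's formulation: the energy is taken to be a Lorentzian data term plus a weighted sum of Lorentzian smoothness terms over the four adjacent pixels, the Lorentzian uses the common robust-statistics form log(1 + (x/σ)²/2), and the minimizer is a simple synchronous variant of iterated conditional modes. The names `lorentzian`, `label_energy`, `refine`, `lam`, `sigma_d` and `sigma_s` are hypothetical.

```python
import numpy as np

def lorentzian(x, sigma):
    """Common robust-statistics form: log(1 + 0.5 * (x / sigma)^2)."""
    return np.log1p(0.5 * (x / sigma) ** 2)

def label_energy(label, u, v, lam, sigma_d, sigma_s):
    """Per-pixel energy of assigning `label`, given current labels u and classifier output v."""
    data = lorentzian(label - v, sigma_d)          # stay close to the classifier
    rows, cols = u.shape
    padded = np.pad(u, 1, mode="edge")
    smooth = np.zeros((rows, cols))
    for dy, dx in ((0, 1), (2, 1), (1, 0), (1, 2)):  # up, down, left, right
        neighbor = padded[dy:dy + rows, dx:dx + cols]
        smooth += lorentzian(label - neighbor, sigma_s)  # agree with neighbors
    return data + lam * smooth

def refine(v, lam=1.0, sigma_d=0.5, sigma_s=0.5, n_iter=5):
    """Refine pixel-wise predictions v (values in [0, 1]) into binary labels u."""
    v = np.asarray(v, dtype=float)
    u = (v >= 0.5).astype(float)
    for _ in range(n_iter):
        e0 = label_energy(0.0, u, v, lam, sigma_d, sigma_s)
        e1 = label_energy(1.0, u, v, lam, sigma_d, sigma_s)
        u = (e1 < e0).astype(float)  # pick the lower-energy label at every pixel
    return u
```

With these default parameters, an isolated positive pixel surrounded by background is removed and a one-pixel hole inside a vessel region is filled, which mirrors the qualitative behavior expected from the refinement.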

Datasets
The well-known dataset 3D-IRCADb [22], scanned using contrast-enhanced computed tomography, is adopted for our model training and validation. In this dataset, all the masks of the liver, hepatic veins, portal veins, and arteries are available. Since 3D-IRCADb only contains 20 volumes (2,823 slices), it is suitable for traditional machine learning approaches but not deep learning-based models. This is because computational models should be trained on cases rather than slices so that training bias can be largely avoided. Thus, we do not make head-to-head comparisons with deep learning models, which are prone to overfitting on such a small dataset.

Evaluation metrics
Four metrics are used to evaluate the performance, i.e., accuracy (Acc), sensitivity (Sen), specificity (Spe), and Dice similarity coefficient (DSC). They are defined as
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Sen} = \frac{TP}{TP + FN},$$
$$\mathrm{Spe} = \frac{TN}{TN + FP}, \qquad \mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN},$$
where true positives (TP) are vessel pixels classified correctly, false positives (FP) are pixels classified as vessels incorrectly, true negatives (TN) are pixels classified as non-vessels correctly, and false negatives (FN) are vessel pixels classified incorrectly. Among them, DSC is more meaningful as it is robust to imbalanced labels, which are very common in vessel data.
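The four metrics follow directly from the confusion counts, as the short sketch below shows. The function name `segmentation_metrics` is hypothetical, and the code assumes that both classes occur in the reference mask, so no guard against zero denominators is included.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Return Acc, Sen, Spe and DSC for binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # vessel pixels predicted as vessel
    fp = int(np.sum(pred & ~truth))   # non-vessel pixels predicted as vessel
    tn = int(np.sum(~pred & ~truth))  # non-vessel pixels predicted as non-vessel
    fn = int(np.sum(~pred & truth))   # vessel pixels predicted as non-vessel
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "Sen": tp / (tp + fn),
        "Spe": tn / (tn + fp),
        "DSC": 2 * tp / (2 * tp + fp + fn),
    }
```

As an illustration of the imbalance argument: if a mask has one vessel pixel among 100 and the prediction is entirely background, accuracy is 0.99 and specificity is 1.0, whereas sensitivity and DSC are both 0.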

Performance qualification

Performance on 3D-IRCADb
The filters and operators that are used to transform CT images
Operator      Definition
Gabor [24]    g(x, y)
Hessian [7]   H(I(i, j)) = J(∇I(i, j))
Canny [33]    TrackEdge(DoubleThreshold(GradientSuppression(Gradient(Smooth(I)))))
Pillow [34]   Predefined in the imageFilter module of the package
CL(·) is a contrast limited function, g(x, y) is the function to be convolved with the image matrix I, with x and y the distance between the current location and the interest point (i, j), I′(i, j) is the manipulated intensity of the original intensity I(i, j), K is the convolution kernel, H(·) is the Hessian matrix, J is the Jacobian matrix, r is an intensity smoothing function, g_s is a coordinate smoothing function, (f * g) means the convolution operation between f and g, w and h are the kernel width and height, and γ, σ, and ψ are parameters

The performance of our model is evaluated through a rigorous five-fold cross-validation process. The dataset is partitioned into five folds at the scan level, with four folds (16 scans) designated for training and the remaining fold (4 scans) for testing. The training and testing are iterated across all folds to ensure comprehensive evaluation of all scans independently. On average, the DSC is 0.63 for all the volumes in 3D-IRCADb. However, this score is rarely reported by others. In addition, others report only a subset of volumes with top-performing results. Therefore, we present the results obtained from 3D-IRCADb with the same number of volumes as others; cf. Table 2. Results show that our model significantly outperforms existing approaches in terms of accuracy and specificity.
Regarding sensitivity, our model is superior to others across all the cases, exhibiting an average lift of 2% when compared to the existing leading model. Notably, both sensitivity and DSC can be substantially influenced by the quality of reference masks and predictive accuracy. After carefully checking the labels of 3D-IRCADb, we have found that a considerable portion of labels are incorrectly masked. As shown in Fig. 3, there are many over-labeled, under-labeled, and even wrongly-labeled masks. Since the number of vessel pixels is significantly smaller than that of non-vessel pixels, sensitivity is more affected by imperfect labels, which explains its significant fluctuation. Note that, to ensure a fair comparison, we adhere to the standard settings for the number of testing volumes used in existing approaches: 1, 8, 14, and 20 volumes, respectively. The performance of each volume is evaluated using the five-fold cross-validation, and the result for k volumes is the average over the k top-performing volumes.
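The evaluation protocol described in this subsection (scan-level folds, then averaging over the k best volumes) can be sketched in a few lines of standard-library Python. The helper names `scan_level_folds`, `train_test_splits` and `top_k_mean`, the random shuffling and the seed are hypothetical; how the scans were actually assigned to folds is not stated in the text.

```python
import random

def scan_level_folds(scan_ids, n_folds=5, seed=0):
    """Shuffle whole scans (never individual slices) and deal them into folds."""
    ids = list(scan_ids)
    random.Random(seed).shuffle(ids)
    return [ids[f::n_folds] for f in range(n_folds)]

def train_test_splits(scan_ids, n_folds=5, seed=0):
    """Yield (train_scans, test_scans) so that every scan is tested exactly once."""
    folds = scan_level_folds(scan_ids, n_folds, seed)
    for f, test in enumerate(folds):
        train = [s for g, fold in enumerate(folds) if g != f for s in fold]
        yield train, test

def top_k_mean(per_volume_scores, k):
    """Average a metric over the k best-scoring volumes."""
    best = sorted(per_volume_scores, reverse=True)[:k]
    return sum(best) / len(best)
```

With 20 scans and five folds, each split trains on 16 scans and tests on 4. Setting k to 1, 8, 14 or 20 in `top_k_mean` corresponds to the testing-volume settings listed above, with k = 20 being the plain average over all volumes.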

Performance comparison with deep learning models
The proposed model is trained using multiple filters and therefore inherently needs only a small amount of labeled data. Nonetheless, it may be less effective than deep learning-based models that are capable of automatic feature learning. To assess the efficacy of our proposed model, we evaluate its performance against state-of-the-art deep learning models, including U-Net [38], TransUNet [39], and 3D U-Net [40]. Detailed results presented in Table 2 reveal that our model is slightly inferior to U-Net but notably superior to TransUNet and 3D U-Net. We speculate that this discrepancy is primarily due to the larger number of parameters in the latter two models, particularly in the case of 3D U-Net.

Larger context improves segmentation
Different window sizes, i.e., 1, 3, 5 and 7, are used to capture the context information for vessel segmentation. To explore the impact of the context within a slice and between slices, we have considered the 2D and 3D scenarios. The performance of our model on 3D-IRCADb with various context window sizes is shown in Table 3. In general, a larger window of context generates better segmentation results.
Figure 4 shows two examples of vessel segmentation with various window sizes. It can be observed that a larger window size generates complete internal regions and smoother edges of vessels. In contrast, a small window size is prone to yield more isolated pixels or regions. Besides, the results obtained from 3D voxels are more tolerant to weakly connected regions between vessels than those generated from the 2D pixels.
It is essential to note that a larger voxel size does not always translate to better performance; see Table 3. This is due to the reduced influence of long-distance pixels on the central pixel of interest. Additionally, increasing voxel size substantially enlarges the feature dimension, potentially leading to issues such as the curse of dimensionality.
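A quick calculation shows how fast the feature dimension grows with the context window. The sketch assumes that each of the 22 filters contributes exactly one value per pixel, so that d = 23 with the original intensity included; a filter that returns several responses would make the numbers larger. The function name `feature_dimension` is hypothetical.

```python
def feature_dimension(window, n_filters=22, three_d=True):
    """Length of the feature vector for a given context window size."""
    d = n_filters + 1                                 # original intensity + one value per filter (assumed)
    n = window ** 3 if three_d else window ** 2       # number of pixels in the window
    return n * d

for w in (3, 5, 7):
    print(w, feature_dimension(w, three_d=False), feature_dimension(w, three_d=True))
# 3 207 621
# 5 575 2875
# 7 1127 7889
```

Under this assumption, moving from a 3 × 3 × 3 to a 7 × 7 × 7 voxel multiplies the number of features by more than twelve, while the number of labeled scans stays at 20.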

Markov random field refines segmentation
Although context information has been added to the vessel segmentation model, each pixel is predicted separately. Thus, the connections of vessels over more extensive ranges are not captured. To this end, we borrow the MRF [35] model with a revised energy function to sharpen the distinction between vessels and non-vessels. The MRF-aware results improve the Dice value by 3.1% on average for the 3D-IRCADb dataset (p-value < 2.2e-16); see Fig. 5.
To demonstrate the improvements of the MRF model, we present six representative examples in Fig. 6. It is clear that the revised MRF model is able to remove isolated pixels or smaller regions, fill the holes in vessel regions, and bridge the gaps between separated vessel segments.

Association between critical filters and context
In this study, 22 filters are used to capture vessel information from various perspectives to compensate for the lack of data. However, not all filters are of equal importance to the model. To examine the association between the filters and the context size, we have retrieved the filters selected by XGBoost; see Table 4. Interestingly, only CLAHE, Gabor and Hessian are persistently important to the 2D-wise vessel segmentation. In contrast, most filters are kept for the 3D situation except a few from the Pillow package (the details are shown in Table 4). In addition, more filters are used when the context range is more extensive. These observations support our proposal of using multiple filters with broad context to segment vessels.

Conclusion
Liver vessel segmentation is essential for clinical liver disease diagnosis and treatment. Hence, great efforts have been made to solve this problem from the computational perspective. However, the performance of existing models is still far from satisfactory. The main reasons hindering vessel segmentation progress include small size, heavy noise, low contrast, and irregular shape. These difficulties further prevent the construction of large-volume and high-quality vessel segmentation data, leaving computational models significantly under-fitted, particularly deep learning models. To overcome these limitations, we propose a rich filter-based model to compensate for the scarcity of labeled data, the results of which are further refined by a Markov random field model. Experiments show that the proposed model significantly improves vessel segmentation without complicated models or extensive data. This study reveals that a rich set of seemingly irrelevant filters is helpful for tasks with limited data, such as vessel segmentation.

Table 4 Important filters to vessel segmentation with various context ranges
'PL' represents the Pillow package
The index 'i' (i ∈ {-3, -2, -1, 0, 1, 2, 3}) indicates the position of a slice relative to the slice of interest (always marked as '0'), with negative values denoting preceding slices and positive values denoting following slices. For the 2D situation, only one slice is presented, thus no such index is available

Fig. 2 Diagram of the proposed liver vessel segmentation model. It comprises vessel enhancement, candidate generation, and segmentation refinement. Vessel enhancement is achieved by fading out the background while strengthening boundary regions, candidate vessels are obtained by XGBoost fed with features generated from extensive image filters, and refinement is fulfilled by a refined Markov random field

Fig. 3 Examples of imperfect vessel labels. The red boxes highlight over-labeled, under-labeled, and wrongly-labeled masks

Fig. 4 Vessel segmentation results obtained from various context ranges. The pixels in white are correctly predicted, the red are over-predicted (i.e., false positive) and the green are under-predicted (i.e., false negative). The mark "i × j × k" on a slice indicates the voxel size, where i = 1 means the context is 2D, otherwise 3D

Fig. 5 Performance comparison between MRF-aware and MRF-agnostic results. Note that only the distributions of the Dice coefficient and sensitivity are shown here, as the other metrics are very close to 1 and thus lose distinguishability

Table 3 Segmentation performance of our proposed model on 3D-IRCADb under various voxel sizes
Results are obtained by five-fold cross-validation